Abstract


In this paper, we propose an explicit closed-form Bayes factor for the problem of 
two-sample hypothesis testing. The proposed approach can be regarded as a Bayesian 
version of the pooled-variance t-statistic and has various appealing properties in practical
applications. It relies on data only through the t-statistic and can thus be
calculated by using an Excel spreadsheet or a pocket calculator. It avoids several undesirable
paradoxes, which may be encountered by the previous Bayesian approach
of Gönen et al. (2005). Specifically, the proposed approach can be easily taught in
an introductory statistics course with an emphasis on Bayesian thinking. Simulated
and real data examples are provided for illustrative purposes.

1 Introduction


In an introductory statistics course, we usually teach students how to conduct a hypothesis 
test based on independent samples to compare the means of two populations with equal, 
but unknown variance. Let $y_{ij}$ be random samples drawn from independent and normally
distributed populations with means $\mu_i$ and common variance $\sigma^2$ for $j = 1, \ldots, n_i$ and
$i = 1, 2$. We are interested in testing

$$H_0: \mu_1 = \mu_2 \quad \text{versus} \quad H_1: \mu_1 \neq \mu_2. \tag{1}$$


Within a frequentist framework, the pooled-variance two-sample t test is commonly used 
for the above hypothesis testing. The test statistic is given by 


$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p / \sqrt{n_\delta}}, \tag{2}$$

where $\bar{y}_i = \sum_{j=1}^{n_i} y_{ij} / n_i$ and

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \tag{3}$$

is the pooled-variance estimate of $\sigma^2$ with $s_i^2 = \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 / (n_i - 1)$ for $i = 1, 2$.
Here, $n_\delta = (1/n_1 + 1/n_2)^{-1}$ is often called the “effective sample size” in the two-sample
experiment. At the $\alpha$ significance level, we obtain the critical value $t_{1-\alpha/2,v}$ or P-value
$p = 2P(T \geq |t|)$ with degrees of freedom $v = n_1 + n_2 - 2$, where $t_{1-\alpha/2,v}$ is the $(1 - \alpha/2)$
quantile of the $T_v$ distribution and $T$ has the $T_v$ distribution. We reject the null hypothesis $H_0$
if either $|t| > t_{1-\alpha/2,v}$ or $p < \alpha$; see Weiss (2012).
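As a small illustration of equations (2) and (3), the following sketch computes the pooled-variance t-statistic, the degrees of freedom $v$, and the effective sample size $n_\delta$ from two samples. The function name `pooled_t` and the toy data are assumptions made for this example only; they do not come from the paper.

```python
import math
from statistics import mean, variance


def pooled_t(y1, y2):
    """Return (t, v, n_delta) for the pooled-variance two-sample t-test."""
    n1, n2 = len(y1), len(y2)
    v = n1 + n2 - 2                          # degrees of freedom
    n_delta = 1.0 / (1.0 / n1 + 1.0 / n2)    # "effective sample size"
    # Equation (3); statistics.variance uses the (n - 1) divisor.
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / v
    # Equation (2).
    t = (mean(y1) - mean(y2)) / math.sqrt(sp2 / n_delta)
    return t, v, n_delta


# Made-up data, used only to exercise the function.
t, v, n_delta = pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0])
print("t =", t, "v =", v, "n_delta =", n_delta)
```

Only $t$, $n_1$, and $n_2$ are needed by the Bayes factors discussed later, so a helper of this kind is all the data processing those formulas require.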

Bayesian approaches to hypothesis testing have recently received considerable attention
and are becoming important in different disciplines, such as sociology (Western, 1999),
economics (Fernandez et al., 2001), and psychology (Rouder et al., 2009). Many recent
studies suggest that we should offer at least one course about Bayesian methods to students
at early stages in their mathematics and statistics education; see, for example,
Albert (1997), Gönen et al. (2005), Wetzels et al. (2012), Wulff and Robinson (2014), among others.
Specifically, as stated by Carlin and Louis (2000), “The Bayesian approach to statistical
design and analysis is emerging as an increasingly effective and practical alternative to
the frequentist one.” Such a course will not only motivate students’ interest in Bayesian
thinking, but also help them know how to formulate Bayesian methods in simple statistical
scenarios, such as the hypothesis testing in (1). More importantly, it will make students
ready to use both Bayesian and frequentist ideas.

A natural approach within a Bayesian framework to compare hypotheses is the Bayes
factor (ratio of the marginal densities of the two models); see Kass and Raftery (1995).
For the hypothesis testing in (1), Gönen et al. (2005) proposed a simple closed-form Bayes
factor based on the two-sample t-statistic and it is given by

$$\mathrm{GBF}[H_1 : H_0](\sigma_a^2) = \left[ \frac{1 + t^2/v}{1 + t^2/(v(1 + n_\delta \sigma_a^2))} \right]^{(v+1)/2} (1 + n_\delta \sigma_a^2)^{-1/2}, \tag{4}$$

where $\sigma_a^2$ is a hyperparameter of the prior that needs to be specified. The choice of prior
distributions for deriving the GBF will be stated in detail in the following section. The
GBF in (4) shows a close relationship between frequentist and Bayesian ideas and can be
easily covered in an elementary statistics course. Note that the choice of $\sigma_a^2$ is critical,
because it acts as an inverse prior sample size. Specifically, the GBF with fixed $\sigma_a^2$ may
exhibit some undesirable features, such as Bartlett’s paradox and the information paradox;
see Liang et al. (2008). These paradoxes may confuse students and even make
them struggle when conducting Bayesian data analysis.

In this paper, we specify a hyper-prior for the hyperparameter $\sigma_a^2$ to reduce the impact
of misspecified hyperparameter values. The prior will still result in an explicit expression
of the Bayes factor based on the two-sample t-statistic. It is shown that the proposed
approach resolves several potential difficulties and paradoxes encountered by the previous
approach due to Gönen et al. (2005). We hope that our results will not only facilitate an
intuitive understanding and discussion of the relationship between frequentist and Bayesian
ideas, but also shed some light on the importance of hyper-prior specifications to students,
teachers, and researchers.


The remainder of this paper is organized as follows. In Section 2, we review the existing
Bayes factor of Gönen et al. (2005) and discuss potential difficulties associated with fixed
hyperparameter values. In Section 2.1, we specify a hyper-prior on that hyperparameter,
which yields a closed-form expression for the Bayes factor. We investigate the finite sample
performance of the two Bayesian procedures in a simulation study (Section 3) and a real-data
example (Section 4). Some concluding remarks are given in Section 5 with derivation
of the proposed procedure in the appendix.


2 Bayes inference 


The Bayesian analysis begins with prior specifications for the unknown parameters. Let
$p(Y \mid \theta_j)$ and $\pi_j$ be the likelihood function of $Y$ and the prior probability on hypothesis
$H_j$ ($\pi_0 + \pi_1 = 1$) for $j = 0, 1$, respectively. From Bayes’ theorem, the posterior probability
of $H_j$ is defined as

$$P(H_j \mid Y) = \frac{\pi_j m_j(Y)}{\pi_0 m_0(Y) + \pi_1 m_1(Y)}. \tag{5}$$

The corresponding marginal likelihood of $Y$ given $H_j$ is

$$m_j(Y) = \int p(Y \mid \theta_j)\, \pi_j(\theta_j)\, d\theta_j, \tag{6}$$

where $\pi_j(\theta_j)$ is the prior for the unknown parameter $\theta_j$ under $H_j$ for $j = 0, 1$. The posterior
probability of $H_1$ can be expressed as

$$P(H_1 \mid Y) = \frac{\pi_1 \mathrm{BF}[H_1 : H_0]}{\pi_0 + \pi_1 \mathrm{BF}[H_1 : H_0]} = \left[ 1 + \frac{\pi_0}{\pi_1} \frac{1}{\mathrm{BF}[H_1 : H_0]} \right]^{-1}, \tag{7}$$

where the Bayes factor, $\mathrm{BF}[H_1 : H_0]$, for comparing $H_1$ to $H_0$ is given by

$$\mathrm{BF}[H_1 : H_0] = \frac{m_1(Y)}{m_0(Y)}. \tag{8}$$

The hypothesis $H_1$ ($H_0$) is more likely to be selected when $\mathrm{BF}[H_1 : H_0] > 1$ ($< 1$). More
specifically, Jeffreys (1961) suggested that $\mathrm{BF}[H_1 : H_0] < 0.1$ provides “strong” evidence
in favor of $H_0$, and $\mathrm{BF}[H_1 : H_0] < 0.01$ provides “decisive” evidence. Note that the Bayes
factor for the null relative to the alternative, denoted by $\mathrm{BF}[H_0 : H_1]$, is given by

$$\mathrm{BF}[H_0 : H_1] = \frac{1}{\mathrm{BF}[H_1 : H_0]} = \frac{m_0(Y)}{m_1(Y)}.$$

For the hypothesis testing problem in (1), we need to specify appropriate prior distributions
for $(\mu_1, \mu_2, \sigma^2)$. Gönen et al. (2005) show that this testing problem can be written in
equivalent form as

$$H_0 : \delta = \mu_1 - \mu_2 = 0 \quad \text{versus} \quad H_1 : \delta \neq 0. \tag{9}$$

Therefore, they advocate a prior for $\delta/\sigma$ rather than for the means directly. After
reparameterization from $(\mu_1, \mu_2, \sigma^2)$ to $(\mu, \delta, \sigma^2)$, where $\mu = (\mu_1 + \mu_2)/2$, the suggested
priors are given by

$$\pi(\mu, \sigma^2) \propto 1/\sigma^2 \quad \text{and} \quad \delta/\sigma \mid \mu, \sigma^2, \delta \neq 0 \sim N(\lambda, \sigma_a^2), \tag{10}$$

where $\lambda$ and $\sigma_a^2$ are the hyperparameters that need to be pre-specified. Due to lack of prior
knowledge in practice, it is natural to set $\lambda = 0$ to reflect the uncertain direction of an
effect. Thus, the case for which $\lambda = 0$ will be of interest to us in what follows. The Bayes
factor under the above priors is

$$\mathrm{GBF}[H_1 : H_0](\sigma_a^2) = \left[ \frac{1 + t^2/v}{1 + t^2/(v(1 + n_\delta \sigma_a^2))} \right]^{(v+1)/2} (1 + n_\delta \sigma_a^2)^{-1/2}, \tag{11}$$

where $v = n_1 + n_2 - 2$. Note that the Bayes factor depends on the data only through
the t-statistic and can often be calculated using a pocket calculator. As mentioned in the
Introduction, the choice of $\sigma_a^2$ is quite critical, and in particular, the Bayes factor with
fixed $\sigma_a^2$ may lead to several undesirable properties, such as Bartlett’s paradox and the
information paradox, briefly summarized as follows.

Bartlett’s paradox: Because the hyperparameter $\sigma_a^2$ reflects the variance of the univariate
normal distribution in (10), a large value of $\sigma_a^2$ is often chosen to minimize prior information.
However, when $\sigma_a^2$ becomes sufficiently large, while $v$ is fixed ($n_\delta$ is also fixed), the GBF
tends to 0, indicating that it always favors the null hypothesis, regardless of the information
from the data. This phenomenon is often called Bartlett’s paradox, which has been studied
by Jeffreys (1961) and more recently by Liang et al. (2008).
Information paradox: Suppose that samples are generated under $H_1$. In this setting, when
$v$ is fixed, the posterior probability of $H_1$ should be higher than the one for $H_0$ when the
t-statistic goes to infinity. We thus expect that the GBF tends to infinity as the information
against $H_0$ accumulates. However, with a fixed value of $\sigma_a^2$, the GBF becomes a constant
$(1 + n_\delta \sigma_a^2)^{v/2}$ as $t \to \infty$. This is referred to as the information paradox.
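Both paradoxes can be seen numerically from the GBF formula itself. The sketch below is illustrative only: the function name `gbf` and the chosen grids of $\sigma_a$ and $t$ are assumptions, with $n_1 = n_2 = 10$ picked as a convenient small-sample case.

```python
import math


def gbf(t, n1, n2, sigma_a):
    """GBF[H1:H0] of Gonen et al. with lambda = 0 and prior sd sigma_a."""
    v = n1 + n2 - 2
    n_delta = 1.0 / (1.0 / n1 + 1.0 / n2)
    scale = 1.0 + n_delta * sigma_a ** 2
    ratio = (1.0 + t * t / v) / (1.0 + t * t / (v * scale))
    return ratio ** ((v + 1) / 2.0) / math.sqrt(scale)


n1 = n2 = 10
v = n1 + n2 - 2
n_delta = 1.0 / (1.0 / n1 + 1.0 / n2)

# Bartlett's paradox: t = 5 is held fixed while the prior becomes more diffuse.
for sigma_a in (1.0, 1e2, 1e4, 1e6):
    print("sigma_a =", sigma_a, "GBF =", gbf(5.0, n1, n2, sigma_a))

# Information paradox: sigma_a is held fixed while t grows; the GBF stays
# below the constant (1 + n_delta * sigma_a^2)^(v/2).
sigma_a = 1.0
limit = (1.0 + n_delta * sigma_a ** 2) ** (v / 2.0)
for t in (5.0, 50.0, 500.0, 5000.0):
    print("t =", t, "GBF =", gbf(t, n1, n2, sigma_a), "limit =", limit)
```

In the first loop the Bayes factor shrinks toward 0 even though $t = 5$ is strong evidence against $H_0$; in the second it levels off at the stated limit rather than growing with $t$.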

The two paradoxes may confuse students and even make them struggle with Bayesian
data analysis, especially when we introduce basic ideas of Bayesian inference at an elementary
level. In this paper, we advocate a hyper-prior for $\sigma_a^2$, which not only alleviates the
impact of a misspecified hyperparameter, but also yields an explicit Bayes factor. More
importantly, the proposed approach is still a function of the two-sample t-statistic and enjoys
various appealing properties, as discussed next.


2.1 The hyper-prior for $\sigma_a^2$


In this section, we consider a proper prior for $\sigma_a^2$, denoted by $\pi(\sigma_a^2)$. The proposed Bayes
factor can be written as

$$\mathrm{PBF}[H_1 : H_0] = \int_0^\infty \left[ \frac{1 + t^2/v}{1 + t^2/(v(1 + n_\delta \sigma_a^2))} \right]^{(v+1)/2} (1 + n_\delta \sigma_a^2)^{-1/2}\, \pi(\sigma_a^2)\, d\sigma_a^2. \tag{12}$$

The prior for $\sigma_a^2$ is assigned to be the Pearson type VI distribution with shape parameters
$a > -1$, $b > -1$, and scale parameter $\kappa > 0$. Its probability density function (pdf) is

$$\pi(\sigma_a^2) = \frac{\kappa (\kappa \sigma_a^2)^b (1 + \kappa \sigma_a^2)^{-a-b-2}}{B(a + 1, b + 1)}\, I_{(0,\infty)}(\sigma_a^2), \tag{13}$$

where $B(\cdot, \cdot)$ is a beta function. This prior has also been used by Wang and Sun (2014)
in the one-way random effects model. With the particular choice of $\kappa = n_\delta$ and $b =
(v + 1)/2 - a - 5/2$, the Bayes factor can be greatly simplified as

$$\mathrm{PBF}[H_1 : H_0] = \frac{\Gamma(v/2)\, \Gamma(a + 3/2)}{\Gamma((v + 1)/2)\, \Gamma(a + 1)} \left( 1 + \frac{t^2}{v} \right)^{(v - 2a - 2)/2}, \tag{14}$$

which is an explicit expression and can thus be easily computed using an Excel spreadsheet
or a simple calculator. Such an expression is unavailable for other choices of $\kappa$ and $b$. Like
the GBF in (11), it can be regarded as a Bayesian version of the t-statistic; in addition,
our approach enjoys several appealing properties, which are not shared by the GBF. The
proof of the following theorem is straightforward and is thus omitted here for simplicity.


Theorem 1 In the setting of the information paradox mentioned above, the Bayes factor
in (14) tends to infinity when $-1 < a < v/2 - 1$.


The theorem shows that when $-1 < a < v/2 - 1$, the exponent $(v - 2a - 2)/2$ in (14) is
positive, so the PBF grows without bound as $|t| \to \infty$; the specified hyper-prior thus provides a
resolution of the information paradox that arises in the GBF. In the case of minimum sample
sizes of the two samples (i.e., $n_1 + n_2 = 3$), we have $v = 1$, indicating that $a \in (-1, -1/2)$.
Of particular note is that when $a = -1/2$, the asymptotic tail behavior of

$$\pi(\delta/\sigma \mid \mu, \sigma^2, \delta \neq 0) = \int_0^\infty N(\delta/\sigma \mid \lambda, \sigma_a^2)\, \pi(\sigma_a^2)\, d\sigma_a^2$$

becomes the Cauchy density for sufficiently large $\delta/\sigma$, which provides a flat tail behavior
and diminishes the prior influence of $\pi(\delta/\sigma \mid \mu, \sigma^2, \delta \neq 0)$, especially when $a$ is small.

Consequently, we recommend $a \in (-1, -1/2]$.
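Because the proposed Bayes factor is a product of gamma functions and a power of $1 + t^2/v$, it takes only a few lines to evaluate. The sketch below is a minimal illustration, assuming a hypothetical helper named `pbf`; it works on the log scale through `math.lgamma` purely as a numerical convenience, and it enforces the range $-1 < a < v/2 - 1$ from Theorem 1.

```python
import math


def pbf(t, n1, n2, a=-0.75):
    """Closed-form PBF[H1:H0] with kappa = n_delta and b = (v+1)/2 - a - 5/2."""
    v = n1 + n2 - 2
    if not (-1.0 < a < v / 2.0 - 1.0):
        raise ValueError("need -1 < a < v/2 - 1")
    # log of Gamma(v/2) Gamma(a + 3/2) / (Gamma((v+1)/2) Gamma(a + 1))
    log_const = (math.lgamma(v / 2.0) + math.lgamma(a + 1.5)
                 - math.lgamma((v + 1) / 2.0) - math.lgamma(a + 1.0))
    # exponent (v - 2a - 2)/2 applied to (1 + t^2/v)
    log_power = 0.5 * (v - 2.0 * a - 2.0) * math.log1p(t * t / v)
    return math.exp(log_const + log_power)


# Unlike a Bayes factor with a fixed hyperparameter, the value keeps growing with t.
for t in (2.0, 5.0, 10.0, 50.0):
    print("t =", t, "PBF =", pbf(t, 10, 10))
```

The same expression can be typed into a spreadsheet using its log-gamma function, which is the sense in which the formula needs no special software.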

It deserves mentioning that the prior depends on the sample size and that as the sample
size increases, the prior has a density in the right tail that behaves like $(\sigma_a^2)^{-a-2}$, leading
to a fat tail for small values of $a$. Furthermore, it can be seen from Figure 1 that a higher
prior probability is assigned to the event $\sigma_a^2 > 1$. This phenomenon occurs because the
parameter $\sigma_a^2$ seems to act as an inverse prior sample size. A small value of $\sigma_a^2$ (close to
0) makes the prior converge to a point mass at $\delta = 0$, and the alternative $H_1$ may
collapse to $H_0$. We thus obtain that the Bayes factor (the GBF) tends to 1, indicating that
both hypotheses are equal descriptions of the data in the limit.

Figure 1: The hyper-prior for $\sigma_a^2$ with $\kappa = n_\delta$, $a = -3/4$, and $b = (v + 1)/2 - a - 5/2$ for
different choices of $n_1$ and $n_2$.
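The right-tail behavior of the hyper-prior can be checked directly from its density. The sketch below is an illustration under assumptions: the helper names `log_beta`, `pearson6_pdf`, and `hyperprior_params` are hypothetical, and the setting $n_1 = n_2 = 10$, $a = -3/4$ is just one case. It compares the density with the power law $\kappa^{-a-1} x^{-a-2} / B(a+1, b+1)$ obtained by letting $\kappa x$ dominate the factor $1 + \kappa x$.

```python
import math


def log_beta(p, q):
    """Logarithm of the beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def pearson6_pdf(x, a, b, kappa):
    """Pearson type VI density of sigma_a^2 evaluated at x."""
    if x <= 0.0:
        return 0.0
    log_pdf = (math.log(kappa) + b * math.log(kappa * x)
               - (a + b + 2.0) * math.log1p(kappa * x)
               - log_beta(a + 1.0, b + 1.0))
    return math.exp(log_pdf)


def hyperprior_params(n1, n2, a=-0.75):
    """Return (a, b, kappa) for the choice kappa = n_delta, b = (v+1)/2 - a - 5/2."""
    v = n1 + n2 - 2
    kappa = 1.0 / (1.0 / n1 + 1.0 / n2)
    b = (v + 1) / 2.0 - a - 2.5
    return a, b, kappa


a, b, kappa = hyperprior_params(10, 10)
tail_const = kappa ** (-a - 1.0) / math.exp(log_beta(a + 1.0, b + 1.0))
for x in (1e2, 1e4, 1e6):
    # Ratio of the exact density to the power-law approximation.
    print("x =", x, "ratio =", pearson6_pdf(x, a, b, kappa) * x ** (a + 2.0) / tail_const)
```

The printed ratios approach 1 as $x$ grows, which is the numerical counterpart of the tail statement above.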

To see how the PBF avoids Bartlett’s paradox and the information paradox, we consider
two simple examples with $n_1 = n_2 = 10$: one with a fixed t-statistic, and the other with an
increasing value. Suppose that $t = 5$, providing strong evidence against $H_0$. We observe
from Figure 2 that the PBF with $a \in (-1, -1/2]$ always rejects $H_0$, while the GBF fails to
reject $H_0$ when $\sigma_a$ becomes large, regardless of the information from the data. Also, it is
well-known that the larger the t-statistic, the stronger the evidence against $H_0$. Figure 3
shows that as the t-statistic increases, the PBF grows faster than the GBF, which tends to
a constant, even though $t$ becomes very large. These two examples show that the
PBF not only avoids these paradoxes, but also helps students understand them better.

Figure 2: The Bayes factor as a function of the hyperparameter (left: the GBF; right: the
PBF) when $n_1 = n_2 = 10$.


3 Simulation study 

Figure 3: The Bayes factor as a function of the t-statistic (left: the GBF with $\sigma_a = 0.1$;
right: the PBF with $a = -0.75$) when $t = 5$ and $n_1 = n_2 = 10$.

In this section, we conduct simulation studies and sensitivity analysis to investigate the
finite sample performance of the two Bayes factors (GBF and PBF) with various choices
of their corresponding hyperparameters. For sample 1, we generate $n_1$ random variables
normally distributed with mean 0 and standard deviation 1. For sample 2, we
generate $n_2$ random variables normally distributed with mean $\delta$ and standard deviation
1, where $\delta$ ranges from $-4$ to 4 in increments of 0.1. To assess the sensitivity of the
hyperparameters, we take $\sigma_a = \{0.1, 1/3, 0.5, 1, 1.5, 2, 5\}$ for the GBF in (11) and $a =
\{-0.95, -0.9, -0.8, -0.75, -0.7, -0.6, -0.5\}$ for the PBF in (14). For each case, we analyze
10,000 simulated datasets with various choices of $n_1$ and $n_2$. The decision criterion
used in this paper is to choose $H_1$ if the Bayes factor $> 1$ and $H_0$ otherwise.
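The simulation design just described can be sketched in a few lines. The code below is a scaled-down illustration, not the authors' script: it assumes hypothetical helper names (`t_stat`, `gbf`, `pbf`, `rejection_rate`), uses 1,000 replications instead of 10,000, fixes a random seed, and evaluates only three values of $\delta$ with $n_1 = n_2 = 10$.

```python
import math
import random
from statistics import mean, variance


def t_stat(y1, y2):
    """Pooled-variance two-sample t-statistic."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return (mean(y1) - mean(y2)) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))


def gbf(t, n1, n2, sigma_a):
    """GBF[H1:H0] with lambda = 0."""
    v = n1 + n2 - 2
    scale = 1.0 + sigma_a ** 2 / (1.0 / n1 + 1.0 / n2)
    ratio = (1.0 + t * t / v) / (1.0 + t * t / (v * scale))
    return ratio ** ((v + 1) / 2.0) / math.sqrt(scale)


def pbf(t, n1, n2, a):
    """Closed-form PBF[H1:H0]."""
    v = n1 + n2 - 2
    log_const = (math.lgamma(v / 2.0) + math.lgamma(a + 1.5)
                 - math.lgamma((v + 1) / 2.0) - math.lgamma(a + 1.0))
    return math.exp(log_const + 0.5 * (v - 2.0 * a - 2.0) * math.log1p(t * t / v))


def rejection_rate(bayes_factor, delta, n1, n2, reps=1000, seed=0):
    """Share of simulated datasets in which the Bayes factor exceeds 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y1 = [rng.gauss(0.0, 1.0) for _ in range(n1)]      # sample 1: N(0, 1)
        y2 = [rng.gauss(delta, 1.0) for _ in range(n2)]    # sample 2: N(delta, 1)
        if bayes_factor(t_stat(y1, y2), n1, n2) > 1.0:
            hits += 1
    return hits / reps


for delta in (0.0, 0.5, 1.0):
    g = rejection_rate(lambda t, n1, n2: gbf(t, n1, n2, 1.0 / 3.0), delta, 10, 10)
    p = rejection_rate(lambda t, n1, n2: pbf(t, n1, n2, -0.75), delta, 10, 10)
    print("delta =", delta, "GBF rate =", g, "PBF rate =", p)
```

Passing the Bayes factor as a function keeps the loop identical for both procedures, so a sensitivity study only needs to vary the lambda that fixes $\sigma_a$ or $a$.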

The relative frequencies of rejecting $H_0$ under the three different choices of sample size
are depicted in Figures 4, 5, and 6. Rather than providing exhaustive results based on these
simulations, we merely highlight the most important findings from the three figures. (i)
The GBF is quite sensitive to the choice of the hyperparameter $\sigma_a$, even when the sample
size is large. For instance, when $n_1 = n_2 = 100$ and $\delta = -0.3$, the frequency of rejecting $H_0$
changes from 0.8479 to 0.2843 with $\sigma_a$ increasing from 0.1 to 5. (ii) The PBF is relatively
insensitive to the hyperparameter $a$, and when the sample size is large, the PBF behaves
similarly for all values. (iii) We observe that under $H_0$ (i.e., $\delta = 0$), the relative frequency
of rejecting $H_0$ varies greatly for the GBF with different choices of $\sigma_a$, whereas the PBF is
quite stable across different values of $a$.

Figure 4: The relative frequency of rejection of $H_0$ under different procedures (left: the
GBF; right: the PBF) when $n_1 = n_2 = 10$.

Figure 5: The relative frequency of rejection of $H_0$ under different procedures (left: the
GBF; right: the PBF) when $n_1 = n_2 = 30$.

Figure 6: The relative frequency of rejection of $H_0$ under different procedures (left: the
GBF; right: the PBF) when $n_1 = n_2 = 100$.

We now compare the performance of the two Bayes factors with the P-value based
on the t-statistic in (2) when $\alpha = 0.05$. Based on the same simulation scheme described
above, we consider the GBF with $\sigma_a = 1/3$, suggested by Gönen et al. (2005), and the
PBF with $a = -3/4$. Figure 7 depicts the numerical findings with different sample sizes.
We observe that the PBF and the P-value have similar performances, whereas they are
significantly different from the GBF. As expected, when the sample size becomes large, the
three procedures behave very similarly. In addition, the PBF has a faster decreasing rate to
zero than the two other methods, in terms of the relative frequency of rejecting $H_0$. Thus,
we may conclude that the PBF is consistent under $H_0$ when the sample size approaches
infinity. This property is not shared by the two other methods under consideration.


4 A real-data application 

We compare the performance of the two Bayes factors via a real-data example available at
The Data and Story Library (http://lib.stat.cmu.edu/DASL/Datafiles/Calcium.html).
The data consist of the blood pressure measurements for 21 African-American men: 10 of
the men took calcium supplements and 11 took placebos. We are interested in testing if
increasing calcium intake reduces blood pressure. The pooled-variance t-statistic is 1.634,
with the two-sided P-value of 0.1187. The positive t-statistic indicates intake of calcium is
beneficial for reducing blood pressure, and the P-value shows that the null hypothesis that
calcium has no effect cannot be rejected at the 5% significance level.

To fully specify the Bayesian approach, we need to choose appropriate priors for the
unknown parameters. Due to lack of prior knowledge, we consider $\pi_0 = \pi_1 = 1/2$. Therefore,
for decision-making, the hypothesis $H_1$ is more likely to be selected if $P(H_1 \mid Y) > 1/2$, or
equivalently, the value of the Bayes factor is larger than 1.


$\sigma_a$            1/10    1/3     1/2     1       1.5     2       5
GBF[H_1 : H_0]    1.037   1.264   1.358   1.193   0.934   0.746   0.321
P(H_1 | Y)        0.509   0.558   0.576   0.544   0.483   0.427   0.243

Table 1: Numerical summaries of the GBF with different choices of $\sigma_a$.


Gönen et al. (2005) analyze this dataset by using the GBF with $\sigma_a = 1/3$ and obtain
that the null hypothesis is less likely because $P(H_1 \mid Y) = 0.558$. From a practical viewpoint,
we shall be interested in a sensitivity analysis of the hyperparameter $\sigma_a$. Numerical
results are reported in Table 1. We observe that the GBF eventually decreases as $\sigma_a$ increases.
When $\sigma_a \geq 1.5$, the GBF tends to favor $H_0$, whereas it tends to reject $H_0$ when $\sigma_a \leq 1$. The
corresponding posterior probability changes from 0.509 (against $H_0$) to 0.243 (against $H_1$)
when $\sigma_a$ changes from 1/10 to 5. This observation shows that the GBF is quite sensitive
to the choice of $\sigma_a$ and that different choices of $\sigma_a$ may lead to a contradiction in a
decision-making process. We now employ the PBF with different values of $a \in (-1, -1/2]$. It can
be seen from Table 2 that the PBF is quite robust to the choice of $a$ and leads to the same
decision. In addition, the conclusion based on the PBF coincides with the one based
on the two-sided P-value.
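The calcium example needs only the reported summary values $t = 1.634$, $n_1 = 10$, and $n_2 = 11$, so the sensitivity analysis can be retraced without the raw data. The sketch below is illustrative: the function names `gbf`, `pbf`, and `posterior_h1` are assumptions, and the posterior probability uses equal prior odds as in the text.

```python
import math


def gbf(t, n1, n2, sigma_a):
    """GBF[H1:H0] with lambda = 0."""
    v = n1 + n2 - 2
    scale = 1.0 + sigma_a ** 2 / (1.0 / n1 + 1.0 / n2)
    ratio = (1.0 + t * t / v) / (1.0 + t * t / (v * scale))
    return ratio ** ((v + 1) / 2.0) / math.sqrt(scale)


def pbf(t, n1, n2, a):
    """Closed-form PBF[H1:H0]."""
    v = n1 + n2 - 2
    log_const = (math.lgamma(v / 2.0) + math.lgamma(a + 1.5)
                 - math.lgamma((v + 1) / 2.0) - math.lgamma(a + 1.0))
    return math.exp(log_const + 0.5 * (v - 2.0 * a - 2.0) * math.log1p(t * t / v))


def posterior_h1(bf, pi1=0.5):
    """Posterior probability of H1 given BF[H1:H0] and prior probability pi1."""
    return pi1 * bf / ((1.0 - pi1) + pi1 * bf)


t, n1, n2 = 1.634, 10, 11   # summary values reported for the calcium data

for sigma_a in (1.0 / 3.0, 1.0, 5.0):
    bf = gbf(t, n1, n2, sigma_a)
    print("sigma_a = %.3f  GBF = %.3f  P(H1|Y) = %.3f" % (sigma_a, bf, posterior_h1(bf)))

for a in (-0.9, -0.75, -0.5):
    bf = pbf(t, n1, n2, a)
    print("a = %.2f  PBF = %.3f  P(H1|Y) = %.3f" % (a, bf, posterior_h1(bf)))
```

The GBF crosses 1 as $\sigma_a$ moves across the grid, whereas the PBF stays below 1 for every $a$ in the recommended range, which is the contrast the two tables summarize.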


a                 -9/10   -4/5    -3/4    -7/10   -3/5    -1/2
PBF[H_1 : H_0]    0.177   0.316   0.375   0.429   0.534   0.606
P(H_1 | Y)        0.150   0.240   0.273   0.300   0.344   0.377

Table 2: Numerical summaries based on the PBF with different choices of $a$.











5 Concluding remarks 


In this paper, we propose an explicit closed-form Bayes factor for testing the difference 
between two means from two separate groups of subjects. The proposed approach enjoys
several appealing properties. It relies on data only through the classical t-statistic and can
thus be easily calculated using a simple calculator. It avoids several undesirable properties
encountered by the approach due to Gönen et al. (2005). More importantly, it can be easily


taught in elementary statistics with an emphasis on Bayesian thinking. We hope that the 
results of this paper will not only facilitate an intuitive understanding of the relationship 
between frequentist and Bayesian ideas, but also shed some light on the importance of 
hyper-prior specifications to students, educators, and researchers. 


6 Appendix 

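Derivation of equation (14): When we consider the Pearson type VI distribution with
$\kappa = n_\delta$, the Bayes factor in (12) can be expressed as

$$\mathrm{PBF}[H_1 : H_0] = \frac{n_\delta}{B(a+1, b+1)} \int_0^\infty \left[ \frac{1 + t^2/v}{1 + t^2/(v(1 + n_\delta \sigma_a^2))} \right]^{(v+1)/2} (1 + n_\delta \sigma_a^2)^{-1/2}\, (n_\delta \sigma_a^2)^b (1 + n_\delta \sigma_a^2)^{-a-b-2}\, d\sigma_a^2.$$

With the transformation $\tau = n_\delta \sigma_a^2$ and $b = (v + 1)/2 - a - 5/2$, it follows

$$\begin{aligned}
\mathrm{PBF}[H_1 : H_0]
&= \frac{1}{B(a+1, b+1)} \int_0^\infty \left[ \frac{1 + t^2/v}{1 + t^2/(v(1 + \tau))} \right]^{(v+1)/2} \tau^b (1 + \tau)^{-a-b-5/2}\, d\tau \\
&= \frac{(1 + t^2/v)^{(v+1)/2}}{B(a+1, b+1)} \int_0^\infty \left( 1 + \frac{t^2}{v(1 + \tau)} \right)^{-(v+1)/2} \tau^b (1 + \tau)^{-a-b-5/2}\, d\tau \\
&= \frac{(1 + t^2/v)^{(v+1)/2}}{B(a+1, b+1)} \int_0^\infty \left( 1 + \tau + \frac{t^2}{v} \right)^{-(v+1)/2} \tau^b (1 + \tau)^{(v+1)/2 - a - b - 5/2}\, d\tau \\
&= \frac{(1 + t^2/v)^{(v+1)/2}}{B(a+1, b+1)} \int_0^\infty \tau^b \left( 1 + \tau + \frac{t^2}{v} \right)^{-(v+1)/2} d\tau \quad \text{since } b = (v + 1)/2 - a - 5/2 \\
&= \frac{1}{B(a+1, b+1)} \int_0^\infty \tau^b \left( 1 + \frac{\tau}{1 + t^2/v} \right)^{-(v+1)/2} d\tau.
\end{aligned}$$

With the transformation $x = \tau/(1 + t^2/v)$, it follows

$$\begin{aligned}
\mathrm{PBF}[H_1 : H_0]
&= \frac{(1 + t^2/v)^{b+1}}{B(a+1, b+1)} \int_0^\infty x^b (1 + x)^{-(v+1)/2}\, dx \\
&= \frac{B(b+1, (v+1)/2 - b - 1)}{B(a+1, b+1)} \left( 1 + \frac{t^2}{v} \right)^{b+1} \\
&= \frac{\Gamma(v/2)\, \Gamma(a + 3/2)}{\Gamma((v+1)/2)\, \Gamma(a + 1)} \left( 1 + \frac{t^2}{v} \right)^{(v - 2a - 2)/2},
\end{aligned}$$

because of $b = (v + 1)/2 - a - 5/2$. This completes the proof.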
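The closed form can also be cross-checked against a direct numerical evaluation of the integral. The sketch below is an independent sanity check under stated assumptions rather than part of the proof: the helper names are hypothetical, the substitution $u = \tau/(1 + \tau)$ is chosen here only to map the integral onto $(0, 1)$, and a plain midpoint rule is used, so agreement is expected only to a few decimal places.

```python
import math


def pbf_closed_form(t, v, a):
    """Closed-form Bayes factor derived above."""
    log_const = (math.lgamma(v / 2.0) + math.lgamma(a + 1.5)
                 - math.lgamma((v + 1) / 2.0) - math.lgamma(a + 1.0))
    return math.exp(log_const + 0.5 * (v - 2.0 * a - 2.0) * math.log1p(t * t / v))


def pbf_numeric(t, v, a, cells=100000):
    """Midpoint-rule value of the defining integral with kappa = n_delta.

    With tau = n_delta * sigma_a^2 and u = tau / (1 + tau), the integrand is
    [(1 + c) / (1 + c (1 - u))]^((v+1)/2) * u^b * (1 - u)^(a + 1/2) / B(a+1, b+1),
    where c = t^2 / v and b = (v + 1)/2 - a - 5/2.
    """
    b = (v + 1) / 2.0 - a - 2.5
    log_beta = math.lgamma(a + 1.0) + math.lgamma(b + 1.0) - math.lgamma(a + b + 2.0)
    c = t * t / v
    h = 1.0 / cells
    total = 0.0
    for i in range(cells):
        u = (i + 0.5) * h
        log_f = (0.5 * (v + 1) * (math.log1p(c) - math.log1p(c * (1.0 - u)))
                 + b * math.log(u) + (a + 0.5) * math.log(1.0 - u))
        total += math.exp(log_f)
    return total * h / math.exp(log_beta)


for a in (-0.75, -0.5):
    print("a =", a,
          "closed form =", pbf_closed_form(1.634, 19, a),
          "numeric =", pbf_numeric(1.634, 19, a))
```

For $a < -1/2$ the transformed integrand has an integrable singularity at $u = 1$, which slows the midpoint rule; a finer grid or an adaptive quadrature routine would tighten the agreement.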